ABSTRACT


General additive functions called rewards are defined on a "regular" finite-state Markov-renewal process. The asymptotic form of the mean total reward in [0,t] has previously been obtained, and it is known that the total rewards are jointly normally distributed as $t \to \infty$. This paper finds the dominant asymptotic term in the covariance of the total rewards as a simple function of the moments of the per-transition rewards and the "bias" term of the mean total rewards. Special formulas for the dominant covariance term of "number of visits" and "occupation time" in given states are also derived.


LIMITING COVARIANCE IN MARKOV-RENEWAL PROCESSES

Consider a finite-state Markov-renewal process which moves through states $i_0, i_1, \ldots, i_k, i_{k+1}, \ldots$ at epochs $0 = S_0 < S_1 < \cdots < S_k < S_{k+1} < \cdots$.

If a reward is earned during each transition from state to state, and if successive rewards are additive, it is of interest to study the total reward earned during the interval [0,t]. Typically, the reward earned during a transition from $i_k$ to $i_{k+1}$ might be a random variable which depends upon the values of $i_k$, $i_{k+1}$, and $S_{k+1} - S_k$, as well as on the "excess time" $t - S_k$ ($S_k \le t < S_{k+1}$) of an uncompleted transition.

Thus, the total reward earned in [0,t] is a random sum of additive random variables, and has a well-defined, though complicated, distribution. The purpose of this paper is to summarize some known results on the asymptotic form of the mean total reward, and to present some new results on the dominant asymptotic term of the (co-)variance of the total reward. These results are useful primarily because a central limit theorem often holds for the distribution of total reward as $t \to \infty$.


DEFINITIONS,  NOTATION,  AND  SUMMARY  OF  RESULTS 

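The definition of a Markov-renewal process is that:

$$P\{i_{k+1} = j,\; S_{k+1} - S_k \le x \mid i_k = i,\, i_{k-1}, \ldots, i_0;\; S_k, S_{k-1}, \ldots, S_1\} = Q_{ij}(x) = p_{ij} F_{ij}(x),$$

$$k = 0, 1, 2, \ldots; \qquad i, j = 1, 2, \ldots, M; \qquad x \ge 0.$$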


In other words, the process may be considered as an imbedded Markov chain in which the movement between the $M$ states is governed by the transition probabilities $p_{ij}$, and in which the transition intervals, $T(i_k, i_{k+1})$, are independent samples from the d.f. $F_{i_k i_{k+1}}(\cdot)$.

Thus, once the initial state $i_0$ is given, the bivariate distribution $Q_{ij}(\cdot)$ determines the entire process, since $p_{ij} = Q_{ij}(\infty)$.

For simplicity, we assume $Q_{ij}(0) = 0$.

Let $u_k$ = elapsed time since the last transition ($0 \le u_k \le T(i_k, i_{k+1})$), and define the random variables

$$a(i_k, i_{k+1}; u_k \mid T(i_k, i_{k+1})), \qquad b(i_k, i_{k+1}; u_k \mid T(i_k, i_{k+1}))$$

for all $k = 0, 1, 2, \ldots$, over the appropriate ranges of the arguments. $a$ and $b$ (for short) may be thought of as partial rewards which are accumulated after a time $u_k$ has elapsed during a transition from state $i_k$ to $i_{k+1}$; it is assumed that the joint distribution function of $a$ and $b$ is known, and that all its moments are finite for finite $T$.

In particular, denote the mean partial rewards and the second (cross-)moments of $a$ and $b$ by:

$$\rho^A_{ij}(u \mid \tau) = E\{a(i,j;u \mid \tau)\}; \qquad \rho^{A(2)}_{ij}(u \mid \tau) = E\{[a(i,j;u \mid \tau)]^2\};$$

etc., and let

$$\rho^{AB}_{ij}(u \mid \tau) = E\{a(i,j;u \mid \tau)\, b(i,j;u \mid \tau)\}. \tag{1}$$

Let

$$\rho^A_{ij}(\tau) = \rho^A_{ij}(\tau \mid \tau); \quad \rho^A_{ij} = \int_0^\infty \rho^A_{ij}(\tau)\, dF_{ij}(\tau); \quad \rho^A_i = \sum_{j=1}^M p_{ij}\rho^A_{ij}; \quad \rho^A = \sum_{i=1}^M \pi_i \rho^A_i \tag{2}$$

be the final transition reward and its various average values, for all moments of (1). The $\pi_i$ are the stationary probabilities of the associated Markov chain with transition probabilities $p_{ij}$.

The total reward of type A earned in [0,t], starting from $i_0 = h$, is defined as:

$$A_h(t) = \sum_{k=0}^{N(t)-1} a\big(i_k, i_{k+1}; T(i_k,i_{k+1}) \mid T(i_k,i_{k+1})\big) + a\big(i_{N(t)}, i_{N(t)+1}; t - S_{N(t)} \mid T(i_{N(t)}, i_{N(t)+1})\big) \tag{3}$$

where $N(t) = \sup\{k \ge 0 : S_k \le t\}$ is the number of transitions in [0,t] (if $N(t) = 0$, the first term in $A_h(t)$ is zero). Thus, the total reward is the sum of all the total transition rewards accumulated during $N(t)$ transitions, plus the partial reward earned during the excess time. A similar expression holds for $B_h(t)$.
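Definition (3) can be made concrete with a small Monte Carlo sketch. It assumes, for simplicity, that each partial reward is a deterministic function of $(u, \tau)$, i.e. it uses a given function in place of the random $a(i,j;u \mid \tau)$, and that $F_{ij}(0) = 0$ so every interval is positive; the names `total_reward` and `sample_T` and the dictionary layout are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: transitions[i] = [(j, p_ij), ...]; reward[(i, j)](u, tau) = a(i, j; u | tau)
Transitions = Dict[int, List[Tuple[int, float]]]
RewardFn = Callable[[float, float], float]


def total_reward(h: int, t: float, transitions: Transitions,
                 sample_T: Callable[[int, int], float],
                 reward: Dict[Tuple[int, int], RewardFn],
                 rng: random.Random) -> float:
    """One sample path of A_h(t) as in (3), assuming deterministic partial rewards
    and strictly positive transition intervals (so the loop terminates)."""
    state, clock, total = h, 0.0, 0.0
    while True:
        # draw the next state j with probability p_{state, j}
        r, acc, nxt = rng.random(), 0.0, transitions[state][-1][0]
        for j, p in transitions[state]:
            acc += p
            if r < acc:
                nxt = j
                break
        tau = sample_T(state, nxt)  # T(i_k, i_{k+1})
        if clock + tau <= t:
            # transition completed in [0, t]: earn its full reward a(i, j; tau | tau)
            total += reward[(state, nxt)](tau, tau)
            clock, state = clock + tau, nxt
        else:
            # transition in progress at t: earn the partial reward for the excess time
            return total + reward[(state, nxt)](t - clock, tau)
```

For Policy I of the numerical example, with deterministic intervals of 4 days (up) and 1 day (down) and starting in state 1, `total_reward(1, 4.5, ...)` returns 400 for the completed (1,2) transition plus -200 for half a day of repair, i.e. 200.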

We shall denote the mean and second (cross-)moments of the total rewards $A$ and $B$ by:

$$R^A_h(t) = E\{A_h(t)\}; \quad R^{A(2)}_h(t) = E\{[A_h(t)]^2\}; \quad R^{AB}_h(t) = E\{A_h(t) B_h(t)\}, \tag{4}$$

etc.

It is straightforward to calculate the joint distribution of $A(t)$ and $B(t)$ from the joint d.f. of $a$ and $b$. However, in this paper we shall concentrate on the limiting forms of the means and (co-)variances of these total rewards as $t \to \infty$. These are important, since it is well known that in most "regular" cases of an M.R.P. the limiting joint distribution of several additive functions such as $A(t)$ is the multivariate normal [4][6]. Thus, knowledge of the dominant terms in the means and covariances gives a very good approximation to the joint d.f. for long observation times; and it is these terms which are useful in dynamic programming of an M.R.P. with infinite horizons.
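As a sketch of how the dominant terms are used, the normal approximation implied by this limit theorem needs only the gain $g$, the bias $w_h$, and the dominant variance term $C$ (for a single reward, $C = C^{AA}$): $A_h(t)$ is treated as normal with mean $g t + w_h$ and variance $C t$. The function name is hypothetical, and the approximation is only meaningful for large $t$.

```python
import math


def prob_reward_at_most(x: float, t: float, g: float, w: float, c: float) -> float:
    """Normal approximation to P{A_h(t) <= x} for large t, with mean g*t + w
    (gain and bias of the mean total reward) and variance c*t (dominant term)."""
    mean = g * t + w
    if c <= 0.0:
        # degenerate case: the dominant variance term vanishes
        return 1.0 if x >= mean else 0.0
    z = (x - mean) / math.sqrt(c * t)
    # standard normal d.f. expressed through the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

At $x = g t + w$ the approximation gives 0.5 for any positive $C$; one standard deviation below the mean it gives about 0.159.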

We shall restrict our attention to the case in which all the states in the imbedded Markov chain are positive recurrent, and in which the first two moments of the first-passage time d.f.s, $G_{ij}(\cdot)$, are finite; it is also convenient to assume that all the $G_{ij}(\cdot)$ are non-lattice, although this is not restrictive if the limits are defined correctly (as Cesàro limits; see the Appendix).

For references to cases in which the limiting distribution may not be normal, see Reference 6.

The dominant term and the next term (here called the "gain (rate)" and the "bias", respectively) have previously been found for the mean of $A(t)$ as $t \to \infty$. In Reference 4, MILLER also finds expressions for the dominant covariance term for semi-Markov processes ($p_{ii} = 0$, all $i$) with $M = 2$ and 3 states. In Reference 5, PYKE expresses the dominant covariance term for general M.R.P.s by finding a closed form for the second moment of that portion of $A(t)$ between successive returns to a given state; the results are expressed in terms of a renewal function for an associated M.R.P. with an absorbing state.

The main contribution of this paper is the expression of the dominant covariance term as an explicit function of the bias term of the mean rewards, that is, in terms of the first two moments of the first-passage d.f.s. This expression also includes the excess partial reward in (3), which is not considered in the other references. As special cases, explicit formulas are found for the variances and covariances of the number of times a given state is entered, and for the occupation times of a given state.

Special  results  which  are  needed  for  this  development  are  given  in  the 
Appendix. 


MEAN TOTAL REWARD


It is easy to show that the mean of (3) is given by the renewal equations:

$$R^A_h(t) = \alpha^A_h(t) + \sum_{j=1}^M \int_0^t \big[\rho^A_{hj}(x) + R^A_j(t - x)\big]\, dQ_{hj}(x) \tag{5}$$

for all $t \ge 0$, and for all starting states $h$. A similar expression holds for the mean total reward of type B.

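The transient term in (5), $\alpha^A_h(t)$, is:

$$\alpha^A_h(t) = \sum_{j=1}^M p_{hj} \int_t^\infty \rho^A_{hj}(t \mid \tau)\, dF_{hj}(\tau), \tag{6}$$

which we define for all moments of (1). Under the finiteness assumptions of those moments, we have $\lim_{t\to\infty} \alpha^A_h(t) = 0$; we set:

$$\delta^A_h = \int_0^\infty \alpha^A_h(t)\, dt \quad \text{and} \quad \delta^A = \sum_{j=1}^M \pi_j \delta^A_j, \tag{7}$$

which are finite.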


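Let $N_j(t)$ be the number of times state $j$ is visited during [0,t], and define the renewal function

$$M_{hj}(t) = E[N_j(t) \mid i_0 = h],$$

and its (Laplace-Stieltjes) transform, $\tilde m_{hj}(s) = \int_0^\infty \exp(-st)\, dM_{hj}(t)$. From equation (A.7) of the Appendix, the (Laplace) transform of (5) can then be written as:

$$\tilde R^A_h(s) = \sum_{j=1}^M \tilde m_{hj}(s) \Big[\tilde\alpha^A_j(s) + \frac{1}{s} \sum_{k=1}^M p_{jk} \int_0^\infty e^{-s\tau} \rho^A_{jk}(\tau)\, dF_{jk}(\tau)\Big] \tag{8}$$

where the tilde is used for the transform of the appropriate functions.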
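It is well known that:

$$\lim_{t\to\infty}\Big[M_{hj}(t) - \frac{\pi_j t}{\nu}\Big] = \epsilon_{hj} = \delta_{hj} + \frac{\mu^{(2)}_{jj}}{2\mu_{jj}^2} - \frac{\mu_{hj}}{\mu_{jj}}, \tag{9}$$

with the limit possibly being taken in the Cesàro sense (in (9), $\mu^{(k)}_{hj}$ is the $k$th moment of the first-passage-time d.f.; see the Appendix).

In terms of the renewal function bias, $\epsilon_{hj}$, it is then possible to use a theorem due to Smith [7], or the usual Tauberian limit theorems of transform calculus [2], to show that:

$$\lim_{t\to\infty}\big[R^A_h(t) - g^A t\big] = w^A_h, \tag{10}$$

with the reward gain rate being given by:

$$g^A = \frac{1}{\nu}\sum_{j=1}^M \pi_j \rho^A_j = \frac{\rho^A}{\nu}, \tag{11}$$

and the reward bias as:

$$w^A_h = \sum_{j=1}^M \rho^A_j\, \epsilon_{hj} + \frac{1}{\nu}\sum_{j=1}^M \pi_j \Big[\delta^A_j - \sum_{k=1}^M p_{jk}\int_0^\infty \tau\, \rho^A_{jk}(\tau)\, dF_{jk}(\tau)\Big]. \tag{12}$$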
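The following exact-arithmetic sketch evaluates (9), (11), and (12) for the special case of deterministic linear partial rewards $a(i,j;u \mid \tau) = c_{ij} + d_{ij}u$, the form used in the numerical example below. For such rewards $\rho_{ij} = c_{ij} + d_{ij}\nu_{ij}$, the term $\delta_i$ of (7) is $\sum_j p_{ij}(c_{ij}\nu_{ij} + d_{ij}\nu^{(2)}_{ij}/2)$, and $\sum_j p_{ij}\int \tau\rho_{ij}(\tau)\,dF_{ij}(\tau) = \sum_j p_{ij}(c_{ij}\nu_{ij} + d_{ij}\nu^{(2)}_{ij})$. The first-passage moments are obtained from (A.1) and (A.3) of the Appendix. States are indexed from 0, and the function names are hypothetical.

```python
from fractions import Fraction as Fr


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square, nonsingular)."""
    n = len(A)
    M = [[Fr(v) for v in row] + [Fr(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def gain_and_bias(P, nu, nu2, c, d):
    """Gain (11) and biases (12) for rewards a(i, j; u | tau) = c[i][j] + d[i][j] * u,
    given p_ij, nu_ij = E T(i,j) and nu2_ij = E T(i,j)^2."""
    P, nu, nu2, c, d = [[[Fr(v) for v in row] for row in m] for m in (P, nu, nu2, c, d)]
    n = len(P)
    idx = range(n)
    # stationary probabilities: pi (I - P) = 0 with sum(pi) = 1
    A = [[int(i == j) - P[j][i] for j in idx] for i in idx]
    A[-1] = [1] * n
    pi = solve(A, [0] * (n - 1) + [1])
    nu_i = [sum(P[i][j] * nu[i][j] for j in idx) for i in idx]
    nu_bar = sum(pi[i] * nu_i[i] for i in idx)
    nu2_bar = sum(pi[i] * P[i][j] * nu2[i][j] for i in idx for j in idx)
    rho = [sum(P[i][j] * (c[i][j] + d[i][j] * nu[i][j]) for j in idx) for i in idx]
    dlt = [sum(P[i][j] * (c[i][j] * nu[i][j] + d[i][j] * nu2[i][j] / 2) for j in idx) for i in idx]
    lam = [sum(P[i][j] * (c[i][j] * nu[i][j] + d[i][j] * nu2[i][j]) for j in idx) for i in idx]
    # mean first-passage times from (A.1): one linear system per target state j
    mu = [[Fr(0)] * n for _ in idx]
    for j in idx:
        B = [[int(i == k) - (P[i][k] if k != j else 0) for k in idx] for i in idx]
        for i, v in enumerate(solve(B, nu_i)):
            mu[i][j] = v
    # diagonal second moments from (A.3)
    mu2 = [(nu2_bar + 2 * sum(pi[i] * P[i][k] * nu[i][k] * mu[k][j]
                              for i in idx for k in idx if k != j)) / pi[j] for j in idx]
    # renewal-function bias of (9)
    eps = [[int(h == j) + mu2[j] / (2 * mu[j][j] ** 2) - mu[h][j] / mu[j][j]
            for j in idx] for h in idx]
    g = sum(pi[i] * rho[i] for i in idx) / nu_bar
    shift = sum(pi[j] * (dlt[j] - lam[j]) for j in idx) / nu_bar
    return g, [sum(rho[j] * eps[h][j] for j in idx) + shift for h in idx]
```

With the rewards of the numerical example, this gives $g = 20$ for both policies, biases 150 and -170 (Policy I) and $151\tfrac{2}{3}$ and $-168\tfrac{1}{3}$ (Policy II) for deterministic intervals, and the coefficient -8 of $\sigma^2_{12}$ in Policy I.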


A similar result holds for the asymptotic form of the mean total reward of type B. Notice that neither of the dominant terms in (9) or (10) depends upon the starting state; this is a consequence of the assumption that the imbedded Markov chain is ergodic, and may be generalized, if desired [5][6].


THE  DOMINANT  COVARIANCE  TERM 


By the use of arguments similar to those which led to (5), one may show that the second (cross-)moment of total reward is given by the renewal equations:

$$R^{AB}_h(t) = \alpha^{AB}_h(t) + \sum_{j=1}^M \int_0^t \big[\rho^{AB}_{hj}(x) + \rho^A_{hj}(x) R^B_j(t-x) + \rho^B_{hj}(x) R^A_j(t-x) + R^{AB}_j(t-x)\big]\, dQ_{hj}(x) \tag{13}$$

for all $t \ge 0$ and all starting states $h$. By taking the transform of (13) and using (A.7) and (8), one may then find an explicit expression for $\tilde R^{AB}_h(s)$, in terms of $\tilde m_{hj}(s)$ and the various rewards, similar to (8); we shall not reproduce it here since it includes at least six rather complex terms.

The procedure is then straightforward, although tedious; the limiting forms of each of the terms are then examined, and $R^{AB}_h(t)$ is found to be asymptotically of the form $k_2 t^2 + k_1 t + k_0 + o(1)$. The limiting form of $R^A_h(t) R^B_h(t)$, obtained from (10), is then subtracted, and the quadratic terms cancel. The dominant covariance term

$$C^{AB} = \lim_{t\to\infty} \frac{R^{AB}_h(t) - R^A_h(t) R^B_h(t)}{t} = \lim_{t\to\infty} \frac{E\{[A_h(t) - R^A_h(t)][B_h(t) - R^B_h(t)]\}}{t} \tag{14}$$

is then found to be, after a great deal of algebraic manipulation and use of the formulas of the Appendix:


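[Equation (15), an expression for $C^{AB}$ as a triple sum over $i, j, k = 1, \ldots, M$, is not legible in the source scan.]

for all $h$, with

[Equation (16), a double sum over $i, j = 1, \ldots, M$, is not legible in the source scan.]

and

[Equation (17), a double sum over $i, j = 1, \ldots, M$, is not legible in the source scan.]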


As might be expected on intuitive grounds, $C^{AB}$ is independent of the initial state $h$.


A somewhat simpler expression results if we substitute the appropriate gains and biases from (11) and (12):

[Equation (18), which expresses $C^{AB}$ in terms of $\rho^{AB}$, the gains $g^A$ and $g^B$, and the biases, is only partly legible in the source scan.]

In either case, the correct dominant term for the variance, $C^{AA}$, is obtained by setting B equal to A.
NUMBER OF VISITS AND OCCUPATION TIMES

As important special cases, let us consider the asymptotic means and (co-)variances of: the number of times a state $j$ is visited in [0,t], $N_j(t)$; and the total occupation time of a state $j$ in [0,t], $T_j(t)$, defined as the total time for which state $j$ is the last state visited. For definiteness, we shall consider only states 1 and 2, although the results can clearly be extended to any (sets of) states. The starting state is always $h$.

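We have already indicated that:

$$E[N_1(t) \mid i_0 = h] = M_{h1}(t) = \frac{\pi_1 t}{\nu} + \delta_{h1} + \frac{\mu^{(2)}_{11}}{2\mu_{11}^2} - \frac{\mu_{h1}}{\mu_{11}} + o(1), \tag{19}$$

where we have used the fact that $\mu_{jj} = \nu/\pi_j$ (see the Appendix). A similar formula applies for state 2.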
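It is easy to show that:

$$\lim_{t\to\infty}\Big\{E[T_1(t) \mid i_0 = h] - \frac{\pi_1 \nu_1 t}{\nu}\Big\} = \nu_1\Big[\delta_{h1} + \frac{\mu^{(2)}_{11}}{2\mu_{11}^2} - \frac{\mu_{h1}}{\mu_{11}}\Big] - \frac{\pi_1 \nu^{(2)}_1}{2\nu}, \tag{20}$$

where

$$\nu^{(k)}_i = \sum_{j=1}^M p_{ij}\, E\{[T(i,j)]^k\} \qquad (k = 0, 1, 2, 3, \ldots).$$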
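For lattice transition intervals, (20) holds only in the Cesàro sense. The sketch below checks it numerically for the deterministic alternating process of the numerical example (state 1 lasts `t12`, state 2 lasts `t21`, $p_{12} = p_{21} = 1$), where $\pi_1 = 1/2$, $\nu = (t_{12} + t_{21})/2$, $\nu_1 = t_{12}$, $\nu^{(2)}_1 = t_{12}^2$, $\mu_{11} = t_{12} + t_{21}$, $\mu^{(2)}_{11} = \mu_{11}^2$, and $\mu_{21} = t_{21}$. Because $T_1(t) - \pi_1\nu_1 t/\nu$ is periodic in this case, its time average over one period equals the Cesàro limit. The function names are hypothetical.

```python
def occupation_state1(t: float, start: int, t12: float, t21: float) -> float:
    """T_1(t) for the deterministic alternating process started in state `start`."""
    if start == 2:
        # the first t21 time units are spent in state 2
        return 0.0 if t < t21 else occupation_state1(t - t21, 1, t12, t21)
    n, r = divmod(t, t12 + t21)  # completed cycles, time into the current cycle
    return n * t12 + min(r, t12)


def bias_formula(start: int, t12: float, t21: float) -> float:
    """Right-hand side of (20) for this process."""
    pi1, nu = 0.5, (t12 + t21) / 2
    nu1, nu1_2 = t12, t12 ** 2
    mu11 = t12 + t21
    mu11_2 = mu11 ** 2
    mu_h1 = mu11 if start == 1 else t21
    eps = (1.0 if start == 1 else 0.0) + mu11_2 / (2 * mu11 ** 2) - mu_h1 / mu11
    return nu1 * eps - pi1 * nu1_2 / (2 * nu)


def cesaro_bias(start: int, t12: float, t21: float, steps: int = 100_000) -> float:
    """Time average of T_1(t) - pi_1 nu_1 t / nu over one period (midpoint rule)."""
    period = t12 + t21
    slope = t12 / period  # pi_1 nu_1 / nu
    h = period / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        total += occupation_state1(s, start, t12, t21) - slope * s
    return total / steps
```

For intervals of 4 and 1 days, the formula gives +0.4 when starting in state 1 and -0.4 when starting in state 2, and the time averages agree.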


To find all the (co-)variance terms, we substitute the appropriate terms for the mean rewards. (For example, for the occupation time $T_1$: $\rho^{T_1}_{ij}(u \mid \tau) = \delta_{i1} u$ for all $j$ and all $0 < u \le \tau$, so that $\delta^{T_1} = \pi_1 \nu^{(2)}_1 / 2$, while $\delta^{N_1} = 0$.) We find:

[Formulas (21) through (26), giving the dominant (co-)variance terms of $N_1$, $N_2$, $T_1$, and $T_2$, are not legible in the source scan.]


To obtain the above forms, some formulas of the Appendix must be used, particularly (A.12).

Formula (22) above is well known, but it is believed the others are new.

For a 2-state semi-Markov process ($p_{11} = p_{22} = 0$), formulas (22), (24), (25), and (26) agree with MILLER [4]. The others are not given by him.
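For comparison with the remark that (22) is well known: in renewal theory, if the return times to state 1 have mean $\mu_{11}$ and variance $\sigma^2_{11} = \mu^{(2)}_{11} - \mu_{11}^2$, then $\operatorname{Var} N_1(t)/t \to \sigma^2_{11}/\mu_{11}^3$. Whether (22) is printed in exactly this form cannot be checked from the scan. For the 2-state semi-Markov process with $p_{12} = p_{21} = 1$, the return time is $T(1,2) + T(2,1)$, so $\mu_{11} = \nu_{12} + \nu_{21}$ and $\sigma^2_{11} = \sigma^2_{12} + \sigma^2_{21}$. A minimal sketch (function name hypothetical):

```python
from fractions import Fraction as Fr


def visits_variance_rate(nu12, nu21, var12, var21) -> Fr:
    """lim Var N_1(t) / t = sigma_11^2 / mu_11^3 for the alternating 2-state process,
    with independent up and down times of means nu12, nu21 and variances var12, var21."""
    mu11 = Fr(nu12) + Fr(nu21)      # mean return time to state 1
    sigma2 = Fr(var12) + Fr(var21)  # variance of the return time
    return sigma2 / mu11 ** 3
```

Deterministic intervals give 0; exponential intervals with means 4 and 1 (variances 16 and 1) give $17/125$. In this alternating process $N_1(t)$ and $N_2(t)$ differ by at most one, so their covariance rate equals the same value.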


MARKOV CHAINS WITH SINGLE-INDEX REWARDS


Consider a discrete-parameter Markov chain in which the rewards depend only on the state entered, i.e., $\rho_{ij}(u \mid \tau) = \rho_i$ for all $0 < u \le \tau = 1$ and all $j$. Then (15) becomes:


[Equation (27) is not legible in the source scan.]


where, of course, the renewal-function biases now have a simpler solution, since all transition intervals have unit length. If we make the substitution:


[The substitution (28) is not legible in the source scan.]

then

[Equation (29), an expression for $C^{AB}$, is not legible in the source scan.]
which is a slight generalization of a formula in KEMENY AND SNELL [5] (Theorem 4.6.5, p. 87).

We note that (9) does not agree with KEMENY and SNELL, since their limiting process is over the integers $k = 0, 1, 2, \ldots$, and in our notation they obtain:

[Equation not legible in the source scan.]

and similarly for (10). This affects only the bias term.
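For the discrete-time comparison, Kemeny and Snell's fundamental-matrix form of the limiting covariances of visit counts (limit over the integers) is $\lim_{n\to\infty} \operatorname{Cov}(y^{(n)}_i, y^{(n)}_j)/n = \pi_i z_{ij} + \pi_j z_{ji} - \pi_i\delta_{ij} - \pi_i\pi_j$, with $Z = (I - P + A)^{-1}$ and every row of $A$ equal to $\pi$; for deterministic single-index rewards the covariance rate is then $\sum_i\sum_j \rho^A_i \rho^B_j c_{ij}$. The sketch below computes these $c_{ij}$. It illustrates the comparison only and is not a reconstruction of (27) through (29); the function names are hypothetical.

```python
from fractions import Fraction as Fr


def inverse(A):
    """Exact matrix inverse by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fr(v) for v in row] + [Fr(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def visit_covariance_rates(P):
    """c_ij = lim Cov(y_i, y_j) / n for the visit counts of an ergodic chain."""
    P = [[Fr(v) for v in row] for row in P]
    n = len(P)
    # stationary pi: solve pi (I - P) = 0, sum(pi) = 1, via the last column of an inverse
    A = [[int(i == j) - P[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1] * n
    inv = inverse(A)
    pi = [inv[i][n - 1] for i in range(n)]
    # fundamental matrix Z = (I - P + A)^{-1}, every row of A equal to pi
    Z = inverse([[int(i == j) - P[i][j] + pi[j] for j in range(n)] for i in range(n)])
    return [[pi[i] * Z[i][j] + pi[j] * Z[j][i] - (pi[i] if i == j else 0) - pi[i] * pi[j]
             for j in range(n)] for i in range(n)]
```

A strictly alternating chain gives all $c_{ij} = 0$, an independent fair-coin chain gives the Bernoulli values $\pm 1/4$, and each row of $c$ sums to zero because the visit counts add up to $n$.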


NUMERICAL  EXAMPLE 

As a numerical example of the calculation of the variance, consider the example of Reference 2, in which a two-state alternating process ($p_{12} = p_{21} = 1$) represents a running machine (State 1) or one that has broken down (State 2). The two (maintenance, repair) policies, which tied in gain rate, $g$, were:

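I. (cheap maintenance, expensive repair), for which

$$a(1,2; u \mid \tau) = 100u \quad (0 \le u \le \tau); \qquad \nu_{12} = 4 \text{ days}$$

$$a(2,1; u \mid \tau) = -100 - 200u \quad (0 < u \le \tau); \qquad \nu_{21} = 1 \text{ day}$$

and II. (expensive maintenance, expensive repair), which changes the transition (1,2) rewards to:

$$a(1,2; u \mid \tau) = 84u \quad (0 \le u \le \tau); \qquad \nu_{12} = 5 \text{ days}.$$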

Note that these are deterministic, linear rewards, giving simple forms for (1), (2), (6), (7), and (17). Letting $\sigma^2_{ij} = \nu^{(2)}_{ij} - [\nu_{ij}]^2$ be the variance of the transition time, we find the asymptotic forms of the mean reward (10) to be:

Policy I: $R_1(t) = 20t + 150 - 8\sigma^2_{12} + 22\sigma^2_{21} + o(1)$

$R_2(t) = 20t - 170 - 8\sigma^2_{12} + 22\sigma^2_{21} + o(1)$

Policy II: $R_1(t) = 20t + 151\tfrac{2}{3} - 5\tfrac{1}{3}\sigma^2_{12} + 18\tfrac{1}{3}\sigma^2_{21} + o(1)$

$R_2(t) = 20t - 168\tfrac{1}{3} - 5\tfrac{1}{3}\sigma^2_{12} + 18\tfrac{1}{3}\sigma^2_{21} + o(1)$

It is important to note that $\sigma^2_{12}$ would, in general, be different in Policies I and II.

In Reference 2, we resolved the tie in gain rates for deterministic transition intervals (all $\sigma^2_{ij} = 0$) in terms of the bias of Policy II ($151\tfrac{2}{3} > 150$).

If we now compute the dominant term of the variance of reward, we obtain:

Policy I: $\operatorname{Var} A_h(t) = (1{,}280\,\sigma^2_{12} + 9{,}680\,\sigma^2_{21})\,t + o(t)$

Policy II: $\operatorname{Var} A_h(t) = (682\tfrac{2}{3}\,\sigma^2_{12} + 8{,}066\tfrac{2}{3}\,\sigma^2_{21})\,t + o(t)$
which are of rather large magnitude for moderate $t$ (100, say) and exponential transition-time d.f.s. If we attempt to resolve the tie on the basis of minimum variance, we see that different values for $\sigma^2_{12}$ in the two policies could break the tie either way. Of course, deterministic transition times give zero for the dominant variance term.
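The two variance coefficients can be cross-checked with an argument different from the one behind (15) and (18). The sketch assumes rewards are deterministic given the transition time, treats $Z = Y - gT$ (reward of a completed transition minus gain times its length) as an additive functional of the imbedded chain, and uses the standard Poisson-equation form of its asymptotic variance, $C^{AA} = \frac{1}{\nu}\{E_\pi[Z^2] + 2\sum_i\sum_j \pi_i p_{ij}\zeta_{ij}h_j\}$, with $\zeta_{ij} = E[Z \mid i, j]$, $z_i = \sum_j p_{ij}\zeta_{ij}$, $h - Ph = z$, and $\sum_i \pi_i h_i = 0$. This is a swapped-in technique, not the paper's formula; the names are hypothetical.

```python
from fractions import Fraction as Fr


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square, nonsingular)."""
    n = len(A)
    M = [[Fr(v) for v in row] + [Fr(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def variance_rate(P, nu, nu2, c, d):
    """Dominant variance term for rewards a(i, j; u | tau) = c[i][j] + d[i][j] * u,
    computed via the Poisson equation of the imbedded chain."""
    P, nu, nu2, c, d = [[[Fr(v) for v in row] for row in m] for m in (P, nu, nu2, c, d)]
    n = len(P)
    idx = range(n)
    A = [[int(i == j) - P[j][i] for j in idx] for i in idx]
    A[-1] = [1] * n
    pi = solve(A, [0] * (n - 1) + [1])
    nu_bar = sum(pi[i] * P[i][j] * nu[i][j] for i in idx for j in idx)
    g = sum(pi[i] * P[i][j] * (c[i][j] + d[i][j] * nu[i][j]) for i in idx for j in idx) / nu_bar
    # Z = c + (d - g) T on transition (i, j): conditional mean and second moment
    zeta = [[c[i][j] + (d[i][j] - g) * nu[i][j] for j in idx] for i in idx]
    z2 = [[c[i][j] ** 2 + 2 * c[i][j] * (d[i][j] - g) * nu[i][j] + (d[i][j] - g) ** 2 * nu2[i][j]
           for j in idx] for i in idx]
    z = [sum(P[i][j] * zeta[i][j] for j in idx) for i in idx]
    # Poisson equation h - P h = z, with one equation replaced by sum_i pi_i h_i = 0
    B = [[int(i == k) - P[i][k] for k in idx] for i in idx]
    B[-1] = list(pi)
    h = solve(B, z[:-1] + [0])
    second = sum(pi[i] * P[i][j] * z2[i][j] for i in idx for j in idx)
    cross = sum(pi[i] * P[i][j] * zeta[i][j] * h[j] for i in idx for j in idx)
    return (second + 2 * cross) / nu_bar
```

Since the result is linear in the $\nu^{(2)}_{ij}$, setting one $\sigma^2_{ij}$ to 1 and the other to 0 isolates each coefficient: the checks reproduce 0 for deterministic intervals, 1,280 and 9,680 for Policy I, and $682\tfrac{2}{3}$ and $8{,}066\tfrac{2}{3}$ for Policy II.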

CONCLUSION 

As indicated at the beginning, it is possible to extend these results to more general M.R.P.s; however, it is clear that certain terms then soon retain their dependence on the initial state, or cease to exist.

It is also possible to find the next ("bias") term of the covariance explicitly; however, this term depends upon the initial conditions and appears to have little practical interest.

The primary application of these results would seem to be in the fact that the variables $[A_h(t) - R^A_h(t)]/\sqrt{t}$, $[B_h(t) - R^B_h(t)]/\sqrt{t}$ have a limiting multivariate normal distribution with zero mean and (co-)variances $C^{AA}$, $C^{AB}$, etc. For instance, this might make certain problems of estimation easier. Another possibility might be the selection of "independent" rewards as linear combinations of other conflicting, covariant goals, for a given Markov-renewal process.
APPENDIX


We summarize certain formulas on M.R.P.s which are needed in the text. Some of these formulas are from References 1 and 5. (In all that follows, $i, j = 1, 2, \ldots, M$.)

Let $\nu^{(k)}_{ij}$ and $\mu^{(k)}_{ij}$ ($k = 0, 1, 2, \ldots$) be the moments of the transition interval d.f., $F_{ij}(\cdot)$, and of the first-passage time d.f., $G_{ij}(\cdot)$, respectively; the superscript of the first moment is suppressed, and it is assumed that the first two moments are finite. We define the averaged moments successively as:

$$\nu^{(k)}_i = \sum_{j=1}^M p_{ij}\, \nu^{(k)}_{ij} \quad \text{and} \quad \nu^{(k)} = \sum_{i=1}^M \pi_i\, \nu^{(k)}_i,$$

where the $\pi_i$ are the stationary probabilities associated with the imbedded Markov chain. These two sets of moments are related through the $p_{ij}$ by:

$$\mu_{ij} = \nu_i + \sum_{k \ne j} p_{ik}\, \mu_{kj} \tag{A.1}$$

$$\mu^{(2)}_{ij} = \nu^{(2)}_i + 2\sum_{k \ne j} p_{ik}\, \nu_{ik}\, \mu_{kj} + \sum_{k \ne j} p_{ik}\, \mu^{(2)}_{kj} \tag{A.2}$$

By summing (A.1) and (A.2) when multiplied by $\pi_i$, the simpler diagonal moments follow:

$$\mu_{jj} = \frac{\nu}{\pi_j}; \qquad \mu^{(2)}_{jj} = \frac{1}{\pi_j}\Big\{\nu^{(2)} + 2\sum_{i=1}^M \sum_{k \ne j} \pi_i\, p_{ik}\, \nu_{ik}\, \mu_{kj}\Big\} \tag{A.3}$$

Let $N_j(t)$ be the number of times state $j$ is visited in [0,t], and let the renewal function be $M_{hj}(t) = E\{N_j(t) \mid i_0 = h\}$. (This is $\delta_{hj}$ more than the renewal function used in References 1, 2, and 5.)
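(A.1) and (A.2) are linear systems, one for each target state $j$. The sketch below solves them exactly, so that the diagonal forms (A.3) can be checked; states are indexed from 0 and the function name is hypothetical.

```python
from fractions import Fraction as Fr


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square, nonsingular)."""
    n = len(A)
    M = [[Fr(v) for v in row] + [Fr(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def first_passage_moments(P, nu, nu2):
    """Stationary pi and first-passage moments mu[i][j], mu2[i][j] from (A.1) and (A.2),
    given p_ij, nu_ij and nu_ij^(2)."""
    P, nu, nu2 = [[[Fr(v) for v in row] for row in m] for m in (P, nu, nu2)]
    n = len(P)
    idx = range(n)
    A = [[int(i == j) - P[j][i] for j in idx] for i in idx]
    A[-1] = [1] * n
    pi = solve(A, [0] * (n - 1) + [1])
    nu_i = [sum(P[i][k] * nu[i][k] for k in idx) for i in idx]
    nu2_i = [sum(P[i][k] * nu2[i][k] for k in idx) for i in idx]
    mu = [[Fr(0)] * n for _ in idx]
    mu2 = [[Fr(0)] * n for _ in idx]
    for j in idx:
        # coefficient matrix I - P with column j dropped from the sums over k != j
        B = [[int(i == k) - (P[i][k] if k != j else 0) for k in idx] for i in idx]
        col = solve(B, nu_i)  # (A.1)
        rhs = [nu2_i[i] + 2 * sum(P[i][k] * nu[i][k] * col[k] for k in idx if k != j)
               for i in idx]
        col2 = solve(B, rhs)  # (A.2)
        for i in idx:
            mu[i][j], mu2[i][j] = col[i], col2[i]
    return pi, mu, mu2
```

For the alternating process with intervals of mean 4 and 1 and second moments 17 and 2, this gives $\mu_{11} = 5$ and $\mu^{(2)}_{11} = 17 + 2 + 2 \cdot 4 \cdot 1 = 27$, the second moment of $T(1,2) + T(2,1)$.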
hj 


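By direct arguments:

$$M_{hj}(t) = \delta_{hj} + \sum_{k=1}^M \int_0^t M_{kj}(t - x)\, dQ_{hk}(x). \tag{A.4}$$

Define $\tilde f(s)$ as the Laplace transform of $f(t)$, or the Laplace-Stieltjes transform of $F(t) = \int_0^t f(x)\, dx$. Then (A.4) has the transform:

$$\tilde m_{hj}(s) = \delta_{hj} + \sum_{k=1}^M \tilde q_{hk}(s)\, \tilde m_{kj}(s), \tag{A.5}$$

where $\tilde q_{hk}(s)$ is the Laplace-Stieltjes transform of $Q_{hk}(\cdot)$.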


By direct arguments, the indices under the summation can be changed so that:

$$\sum_{k=1}^M \tilde q_{hk}(s)\, \tilde m_{kj}(s) = \sum_{k=1}^M \tilde m_{hk}(s)\, \tilde q_{kj}(s) \tag{A.6}$$

for all $h$, $j$, and $s > 0$. Denoting the corresponding matrices by dropping the subscripts and the transform argument, (A.6) reads $\tilde q \tilde m = \tilde m \tilde q$, and from (A.5) we get the inverse matrix:

$$[I - \tilde q]^{-1} = \tilde m \qquad (s > 0) \tag{A.7}$$

with $I$ the identity matrix. (A.7) is particularly useful in solving the renewal equations of the text.
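A floating-point sketch of (A.7): given the transforms $\tilde F_{ij}(s)$, it forms $\tilde q_{ij}(s) = p_{ij}\tilde F_{ij}(s)$ and inverts $I - \tilde q(s)$. For the deterministic alternating process of the numerical example (intervals 4 and 1), $M_{11}(t)$ has unit jumps at $t = 0, 5, 10, \ldots$, so $\tilde m_{11}(s) = 1/(1 - e^{-5s})$; for small $s$, $s\,\tilde m_{11}(s) \to \pi_1/\nu = 0.2$ and $\tilde m_{11}(s) - \pi_1/(\nu s) \to 1/2$, the renewal-function bias for this case (a Cesàro limit, since the intervals are lattice). The function names are hypothetical.

```python
import math
from typing import Callable, List


def inverse(A: List[List[float]]) -> List[List[float]]:
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        p = M[c][c]
        M[c] = [v / p for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def renewal_transform(P: List[List[float]],
                      F_lst: List[List[Callable[[float], float]]], s: float) -> List[List[float]]:
    """m~(s) = [I - q~(s)]^{-1} with q~_ij(s) = p_ij F~_ij(s), as in (A.7); requires s > 0."""
    n = len(P)
    q = [[P[i][j] * F_lst[i][j](s) for j in range(n)] for i in range(n)]
    return inverse([[float(i == j) - q[i][j] for j in range(n)] for i in range(n)])
```

For deterministic intervals the Laplace-Stieltjes transform of $F_{ij}$ is simply $e^{-s\,\nu_{ij}}$, which is what the checks use.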

It is well known that, under the assumption of an ergodic imbedded Markov chain and non-lattice $G_{ij}(\cdot)$ with finite first and second moments, the limiting renewal function has the form:

$$M_{hj}(t) = \frac{t}{\mu_{jj}} + \epsilon_{hj} + o(1) \tag{A.8}$$

with

$$\epsilon_{hj} = \delta_{hj} + \frac{\mu^{(2)}_{jj}}{2\mu_{jj}^2} - \frac{\mu_{hj}}{\mu_{jj}}. \tag{A.9}$$

(With lattice $G_{ij}(\cdot)$, (A.8) still holds as a Cesàro limit.)

It then follows from (A.8), (A.6), and some Tauberian arguments that:


[Formulas (A.10) through (A.13) are not legible in the source scan.]


These last four formulas are believed to be original; they are particularly useful in reducing special forms of (15).

Similar forms obtain for the gain, $g$, and bias term, $w_h$, of the mean reward (10), (11), and (12). We have:

[Formulas (A.14) through (A.18) are not legible in the source scan.]


Equation (A.16) is used in the policy-improvement portion of an algorithm for dynamic programming in a Markov-renewal process.
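Letting $t \to \infty$ in (5) and using (10) shows that the biases satisfy $w_h = \rho_h - g\,\nu_h + \sum_k p_{hk} w_k$, the value-determination equations familiar from policy iteration. These determine $g$ and the differences $w_h - w_k$, but not the additive constant, which (12) supplies. It is an assumption here that this is the role (A.16) plays in the algorithm. The sketch fixes the last relative value at zero; the names are hypothetical.

```python
from fractions import Fraction as Fr


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square, nonsingular)."""
    n = len(A)
    M = [[Fr(v) for v in row] + [Fr(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def relative_values(P, nu_i, rho_i):
    """Solve w_h + g*nu_h = rho_h + sum_k p_hk w_k for g and relative values w,
    with w fixed to 0 in the last state (only differences w_h - w_k are determined)."""
    n = len(P)
    # unknowns: w_0, ..., w_{n-2}, g   (w_{n-1} = 0 drops its column)
    A = [[int(h == k) - Fr(P[h][k]) for k in range(n - 1)] + [Fr(nu_i[h])] for h in range(n)]
    x = solve(A, rho_i)
    return x[-1], x[:-1] + [Fr(0)]
```

Both policies of the numerical example give $g = 20$ and $w_1 - w_2 = 320$, in agreement with the biases listed there; the tie in $g$ is why the bias, or the variance, is needed to choose between them.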